5 - Quality Control

Author

CDN team

Published

Last compiled on 20 May, 2025

Quality Control

Introduction

This notebook reviews the quality control steps recommended at the beginning and at each subsequent annotation step when analyzing scRNA-seq. To clean up the cell by gene matrix, we aim to remove technical artifacts such as doublets or empty droplets. In this notebook, we will look at UMI and gene counts, filter out cells with high mitochondrial levels, investigate potential doublets (using DoubletFinder), and compare the cell by gene distribution before and after filtering.


Key Takeaways

  • QC is the process of filtering out “bad” data. This makes it a subjective process, making well-informed decisions is key to avoid removing biological signal. In this context, being conservative in what we call “poor quality” cells is best practice.

  • QC metrics include UMI and gene counts, mitochondrial gene expression, and doublet detection.

  • Looking at QC metric individually is not enough to grasp the whole picture. We need to look at how these metrics covary to make informed decisions.

  • The main pitfall in this step is to be too stringent. This may lead to removing true biological signal. Some examples of cells that might be filtered out during the QC process are neutrophils, platelets, and erythrocytes since they have low library size and complexity.

  • Doublets can be confounded with transitioning cells or cells with mixed phenotypes. Therefore, it is very important that we are very careful when calling a cell a doublet! Otherwise, we might be removing very interesting biology.

  • When in doubt, keep it. Once you remove something, you’ll never get it back.

Why are mitochondrial genes highly expressed and found in many datasets?

There is a disproportionate number of mitochondrial-high cells because cells upregulate mitochondrial genes during stress (such as during library prep) and near death.

From Orchestrating single-cell analysis with Bioconductor:

“High proportions (of mitochondria) are indicative of poor-quality cells (Islam et al. 2014; Ilicic et al. 2016), presumably because of loss of cytoplasmic RNA from perforated cells. The reasoning is that, in the presence of modest damage, the holes in the cell membrane permit efflux of individual transcript molecules but are too small to allow mitochondria to escape, leading to a relative enrichment of mitochondrial transcripts.”

Glossary

Doublet:

A doublet is a technical event when two or more cells end up within the same droplet. These cause confusion in the gene expression distribution especially when cells of two different cell types (different expected gene expression profiles) are in the same droplet.

Heterotypic doublets vs Homotypic doublets

A heterotypic doublet is a doublet with cells from distinct gene expression profiles that are likely different cell types.

A homotypic doublet is a doublet with cells with similar gene expression profiles that are not necessarily different cell types but are suspicious because they express around double the amount that other cells do.

Resources

  1. Current best practices in single-cell RNA-seq analysis: a tutorial

  2. DoubletFinder

  3. Bioconductor: Orchestrating single-cell analysis with Bioconductor

  4. Single Cell Best Practices

  5. Demuxafy

    A software that can access most popular demulitplexing and doublet detection tools. I used it as a review and consolidation of resources.

Loading Libraries and Data

if (!requireNamespace("tidyverse", quietly = TRUE))
    install.packages('tidyverse')
if (!requireNamespace("Seurat", quietly = TRUE))
    install.packages('Seurat')
if (!requireNamespace("colorBlindness", quietly = TRUE))
    install.packages('colorBlindness')
if (!requireNamespace("RColorBrewer", quietly = TRUE))
    install.packages('RColorBrewer')
if (!requireNamespace("DoubletFinder", quietly = TRUE))
    remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')
if (!requireNamespace("ggridges", quietly = TRUE)) {
  install.packages("ggridges")
}
library(Seurat)
library(tidyverse)
library(RColorBrewer)
library(colorBlindness)
library(DoubletFinder)
library(ggridges)

set.seed(687)
# Load Seurat object
se <- readRDS('../data/workshop-data.rds')
se
An object of class Seurat 
33694 features across 22502 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 2 layers present: counts, data
# Set color palette for cell types
donor_pal <- c(
    "#66C2A4", "#41AE76", "#238B45", "#006D2C",
    "#41B6C4", "#1D91C0", "#225EA8", "#253494",
    "#FD8D3C", "#FC4E2A", "#E31A1C", "#BD0026",
    "#ad393b", "#800000", "#800050")

names(donor_pal) <- c(
    "T024", "T036", "T44", "T057", "T110", "T160", "T161", "T182",
    "T017", "T019", "T176", "T189", "T197", "T203", "T202"
)

Let’s set the levels for sample_id so they appear in order

se@meta.data <- se@meta.data %>%
    mutate(sample_id = factor(`Sample name`, levels = names(donor_pal)))

Look at overexpressed genes

Let’s look at the number of counts per barcode, the number of genes per barcode, and the percentage of mitochondrial genes per barcode. These distributions will help us determine the quality of the data by allowing us to see any outliers. These outliers could be dying cells with high mitochondrial counts or doublets with high gene counts.

In public datasets, use colnames() to find the processing and metadata already analyzed.

# Correct calculation of the number of metadata variables
vars <- length(colnames(se@meta.data))
# Use paste0 for concatenation without spaces
print(paste0("There are ", vars, " metadata variables in the seurat object."))
[1] "There are 17 metadata variables in the seurat object."
# Sample the first 10 metadata variables
colnames(se@meta.data)[1:10]
 [1] "orig.ident"         "nCount_RNA"         "nFeature_RNA"       "Age"                "Diagnosis"          "Sample name"        "Type"               "batch"              "doublet_scores"     "predicted_doublets"

Library Size

nCount_RNA is the number of UMIs per cell. This is a measure of the total number of transcripts detected in a cell and is the way to visualize library size. Cells with low nCount_RNA values may be dying cells or biologically relevant cell types with low mRNA content such as neutrophils. Cells with high nCount_RNA values may be doublets or very large cells like macrophages. Note the figure below plots UMIs on a log10 scale.

ggplot(se@meta.data, aes(x = nCount_RNA)) +
    geom_density(color = "#6abcb6", fill = "#6abcb6", alpha = 0.7) +
    scale_x_continuous(
        transform = "log10",
        labels = scales::unit_format(unit = "K", scale = 1e-3)) +
    theme_classic()

Plotting the UMIs by cell type shows that if we were to make a hard cut off at 500 UMIs, we would lose a large portion of platelets and RBCs.

ggplot(se@meta.data, aes(x = nCount_RNA, fill = sample_id)) +
    geom_density(alpha = 0.7) +
  # Hard cutoff at 500 elicits loss in important cells
  geom_vline(xintercept = 500, linetype = "dashed", color = "red") +
    scale_x_continuous(
        transform = "log10",
        labels = scales::unit_format(unit = "K", scale = 1e-3)) +
    theme_classic() +
  scale_color_manual(values = donor_pal)

Library Complexity

nFeature_RNA is the number of genes detected in a cell. Similarly, we want to look for cells with significantly low or high feature count. The feature count distribution gives us an idea of library complexity while UMI counts looks at library size.

ggplot(se@meta.data, aes(x = nFeature_RNA)) +
    geom_density(color = "#6abcb6", fill = "#6abcb6", alpha = 0.7) +
    scale_x_continuous(
        labels = scales::unit_format(unit = "K", scale = 1e-3)) +
    theme_classic()

A cell with a very high number of genes per cell could be indicative of doublets or large cells. Cells on the left side of the distribution with low gene counts could be low quality cells or just RBCs or platelets.

Percentage of mitochondrial genes

Notice the object already has Percent of mitochondrial genes calculated. We are going to calculate them again to show the process.

Seurat’s PercentageFeatureSet function calculates the percentage of a feature set in each cell. This function is useful for calculating the percentage of mitochondrial genes, ribosomal genes, or any other gene set of interest using regular expression (string matching) to filter these out by gene name.

# Calculate the percentage of mitochondrial genes
rownames(se)[str_detect(rownames(se), "^MT-")]
 [1] "MT-ND1"  "MT-ND2"  "MT-CO1"  "MT-CO2"  "MT-ATP8" "MT-ATP6" "MT-CO3"  "MT-ND3"  "MT-ND4L" "MT-ND4"  "MT-ND5"  "MT-ND6"  "MT-CYB" 
se <- PercentageFeatureSet(se, pattern = "^MT-", col.name = "perc.mt")
# Calculate the percentage of ribosomal genes
rownames(se)[str_detect(rownames(se), "^RP[S|L]")]
  [1] "RPL22"          "RPL11"          "RPS6KA1"        "RPS8"           "RPL5"           "RPS27"          "RPS6KC1"        "RPS7"           "RPS27A"         "RPL31"          "RPL37A"         "RPL32"          "RPL15"          "RPSA"           "RPL14"          "RPL29"          "RPL24"          "RPL22L1"        "RPL39L"         "RPL35A"         "RPL9"           "RPL34-AS1"      "RPL34"          "RPS3A"          "RPL37"          "RPS23"          "RPS14"          "RPL26L1"        "RPS18"          "RPS10-NUDT3"    "RPS10"          "RPL10A"         "RPL7L1"         "RPS12"          "RPS6KA2"        "RPS6KA2-AS1"    "RPS6KA3"        "RPS4X"          "RPS6KA6"        "RPL36A"         "RPL36A-HNRNPH2" "RPL39"          "RPL10"          "RPS20"          "RPL7"           "RPL30"          "RPL8"           "RPS6"           "RPL35"          "RPL12"          "RPL7A"          "RPLP2"          "RPL27A"         "RPS13"          "RPS6KA4"        "RPS6KB2"        "RPS3"           "RPS25"          "RPS24"          "RPS26"          "RPL41"          "RPL6"           "RPLP0"          "RPL21"          "RPL10L"         "RPS29"          "RPL36AL"        "RPS6KL1"        "RPS6KA5"        "RPS27L"        
 [71] "RPL4"           "RPLP1"          "RPS17"          "RPL3L"          "RPS2"           "RPS15A"         "RPL13"          "RPL26"          "RPL23A"         "RPL23"          "RPL19"          "RPL27"          "RPS6KB1"        "RPL38"          "RPL17-C18orf32" "RPL17"          "RPS21"          "RPS15"          "RPL36"          "RPS28"          "RPL18A"         "RPS16"          "RPS19"          "RPL18"          "RPL13A"         "RPS11"          "RPS9"           "RPL28"          "RPS5"           "RPS4Y1"         "RPS4Y2"         "RPL3"           "RPS19BP1"      
se <- PercentageFeatureSet(se, pattern = "^RP[S|L]", col.name = "perc.ribo")
# Calculate the percentage of hemoglobin genes
rownames(se)[str_detect(rownames(se), "^HB[^(P)]")]
 [1] "HBEGF" "HBS1L" "HBB"   "HBD"   "HBG1"  "HBG2"  "HBE1"  "HBZ"   "HBM"   "HBA2"  "HBA1"  "HBQ1" 
se <- PercentageFeatureSet(se, pattern = "^HB[^(P)]", col.name = "perc.hb")
# Calculate the percentage of Immunoglobulin genes
rownames(se)[str_detect(rownames(se), "^IG")]
  [1] "IGSF21"       "IGSF3"        "IGKV1OR1-1"   "IGSF9"        "IGSF8"        "IGFN1"        "IGKV3OR2-268" "IGKC"         "IGKV1-12"     "IGKV1-8"      "IGKJ5"        "IGKJ4"        "IGKJ3"        "IGKJ2"        "IGKJ1"        "IGKV4-1"      "IGKV5-2"      "IGKV7-3"      "IGKV2-4"      "IGKV1-5"      "IGKV1-6"      "IGKV3-7"      "IGKV1-9"      "IGKV2-10"     "IGKV3-11"     "IGKV1-13"     "IGKV2-14"     "IGKV3-15"     "IGKV1-16"     "IGKV1-17"     "IGKV2-18"     "IGKV2-19"     "IGKV3-20"     "IGKV6-21"     "IGKV1-22"     "IGKV2-23"     "IGKV2-24"     "IGKV3-25"     "IGKV2-26"     "IGKV1-27"     "IGKV2-28"     "IGKV2-29"     "IGKV2-30"     "IGKV3-31"     "IGKV1-32"     "IGKV1-33"     "IGKV3-34"     "IGKV1-35"     "IGKV2-36"     "IGKV1-37"     "IGKV2-38"     "IGKV1-39"     "IGKV2-40"     "IGKV2D-40"    "IGKV1D-39"    "IGKV2D-38"    "IGKV1D-37"    "IGKV2D-36"    "IGKV1D-35"    "IGKV3D-34"    "IGKV1D-33"    "IGKV1D-32"    "IGKV3D-31"    "IGKV2D-30"    "IGKV2D-29"    "IGKV2D-28"    "IGKV1D-27"    "IGKV2D-26"    "IGKV3D-25"    "IGKV2D-24"    "IGKV2D-23"    "IGKV1D-22"    "IGKV6D-21"    "IGKV3D-20"    "IGKV2D-19"    "IGKV2D-18"    "IGKV6D-41"    "IGKV1D-17"    "IGKV1D-16"   
 [80] "IGKV3D-15"    "IGKV3D-11"    "IGKV2D-14"    "IGKV1D-13"    "IGKV1D-12"    "IGKV2D-10"    "IGKV1D-42"    "IGKV1D-43"    "IGKV1D-8"     "IGKV3D-7"     "IGKV1OR2-118" "IGKV1OR2-1"   "IGKV2OR2-1"   "IGKV2OR2-2"   "IGKV1OR2-3"   "IGKV1OR2-9"   "IGKV2OR2-10"  "IGKV2OR2-7D"  "IGKV3OR2-5"   "IGKV1OR2-6"   "IGKV2OR2-7"   "IGKV2OR2-8"   "IGKV1OR2-11"  "IGKV1OR2-108" "IGFBP2"       "IGFBP5"       "IGSF11"       "IGSF11-AS1"   "IGSF10"       "IGF2BP2"      "IGF2BP2-AS1"  "IGFBP7"       "IGFBP7-AS1"   "IGIP"         "IGF2R"        "IGF2BP3"      "IGFBP1"       "IGFBP3"       "IGBP1"        "IGBP1-AS2"    "IGBP1-AS1"    "IGSF1"        "IGLV8OR8-1"   "IGHEP2"       "IGFBPL1"      "IGKV1OR9-2"   "IGKV1OR-2"    "IGKV1OR9-1"   "IGKV1OR-3"    "IGF2"         "IGF2-AS"      "IGSF22"       "IGHMBP2"      "IGSF9B"       "IGKV1OR10-1"  "IGFBP6"       "IGF1"         "IGHA2"        "IGHE"         "IGHG4"        "IGHG2"        "IGHGP"        "IGHA1"        "IGHEP1"       "IGHG1"        "IGHG3"        "IGHD"         "IGHM"         "IGHJ6"        "IGHJ3P"       "IGHJ5"        "IGHJ4"        "IGHJ3"        "IGHJ2P"       "IGHJ2"        "IGHJ1"        "IGHD7-27"     "IGHJ1P"       "IGHD1-26"    
[159] "IGHD6-25"     "IGHD5-24"     "IGHD4-23"     "IGHD3-22"     "IGHD2-21"     "IGHD1-20"     "IGHD6-19"     "IGHD5-18"     "IGHD4-17"     "IGHD3-16"     "IGHD2-15"     "IGHD1-14"     "IGHD6-13"     "IGHD5-12"     "IGHD4-11"     "IGHD3-10"     "IGHD3-9"      "IGHD2-8"      "IGHD1-7"      "IGHD6-6"      "IGHD5-5"      "IGHD4-4"      "IGHD3-3"      "IGHD2-2"      "IGHD1-1"      "IGHV6-1"      "IGHVII-1-1"   "IGHV1-2"      "IGHVIII-2-1"  "IGHV1-3"      "IGHV4-4"      "IGHV2-5"      "IGHVIII-5-1"  "IGHVIII-5-2"  "IGHV3-6"      "IGHV3-7"      "IGHV3-11"     "IGHVIII-11-1" "IGHV1-12"     "IGHV3-13"     "IGHVIII-13-1" "IGHV1-14"     "IGHV3-15"     "IGHVII-15-1"  "IGHV3-16"     "IGHV1-17"     "IGHV1-18"     "IGHV3-19"     "IGHV3-20"     "IGHV3-21"     "IGHV3-22"     "IGHVII-22-1"  "IGHVIII-22-2" "IGHV3-23"     "IGHV1-24"     "IGHV3-25"     "IGHVIII-25-1" "IGHV2-26"     "IGHVIII-26-1" "IGHVII-26-2"  "IGHV7-27"     "IGHV4-28"     "IGHV3-29"     "IGHV3-30"     "IGHVII-30-1"  "IGHV3-30-2"   "IGHV4-31"     "IGHVII-31-1"  "IGHV3-32"     "IGHV3-33"     "IGHVII-33-1"  "IGHV3-33-2"   "IGHV4-34"     "IGHV7-34-1"   "IGHV3-35"     "IGHV3-36"     "IGHV3-37"     "IGHV3-38"     "IGHVIII-38-1"
[238] "IGHV4-39"     "IGHV7-40"     "IGHVII-40-1"  "IGHV3-41"     "IGHV3-42"     "IGHV3-43"     "IGHVIII-44"   "IGHVIV-44-1"  "IGHVII-44-2"  "IGHV1-45"     "IGHV1-46"     "IGHVII-46-1"  "IGHV3-47"     "IGHVIII-47-1" "IGHV3-48"     "IGHV3-49"     "IGHVII-49-1"  "IGHV3-50"     "IGHV5-51"     "IGHVIII-51-1" "IGHVII-51-2"  "IGHV3-52"     "IGHV3-53"     "IGHVII-53-1"  "IGHV3-54"     "IGHV4-55"     "IGHV7-56"     "IGHV3-57"     "IGHV1-58"     "IGHV4-59"     "IGHV3-60"     "IGHVII-60-1"  "IGHV4-61"     "IGHV3-64"     "IGHV3-62"     "IGHVII-62-1"  "IGHV3-63"     "IGHV3-65"     "IGHVII-65-1"  "IGHV3-66"     "IGHV1-67"     "IGHVII-67-1"  "IGHVIII-67-2" "IGHVIII-67-3" "IGHVIII-67-4" "IGHV1-68"     "IGHV1-69"     "IGHV2-70"     "IGHV1-69-2"   "IGHV2-70-1"   "IGHV3-71"     "IGHV3-72"     "IGHV3-73"     "IGHV3-74"     "IGHVII-74-1"  "IGHV3-75"     "IGHV3-76"     "IGHVIII-76-1" "IGHV5-78"     "IGHVII-78-1"  "IGHV3-79"     "IGHV7-81"     "IGHVIII-82"   "IGHV1OR15-9"  "IGHV1OR15-2"  "IGHV3OR15-7"  "IGHD5OR15-5A" "IGHD4OR15-4A" "IGHD3OR15-3A" "IGHD2OR15-2A" "IGHD1OR15-1A" "IGHV1OR15-6"  "IGHD5OR15-5B" "IGHD4OR15-4B" "IGHD3OR15-3B" "IGHD2OR15-2B" "IGHD1OR15-1B" "IGHV1OR15-1"  "IGHV1OR15-3" 
[317] "IGHV4OR15-8"  "IGHV1OR15-4"  "IGDCC3"       "IGDCC4"       "IGF1R"        "IGFALS"       "IGSF6"        "IGHV1OR16-1"  "IGHV1OR16-3"  "IGHV3OR16-9"  "IGHV2OR16-5"  "IGHV3OR16-15" "IGHV3OR16-6"  "IGHV1OR16-2"  "IGHV3OR16-10" "IGHV1OR16-4"  "IGHV3OR16-8"  "IGHV3OR16-12" "IGHV3OR16-13" "IGHV3OR16-11" "IGHV3OR16-7"  "IGFBP4"       "IGF2BP1"      "IGLJCOR18"    "IGFLR1"       "IGSF23"       "IGFL4"        "IGFL3"        "IGFL2"        "IGFL1"        "IGLON5"       "IGKV1OR22-5"  "IGKV2OR22-4"  "IGKV2OR22-3"  "IGKV3OR22-2"  "IGKV1OR22-1"  "IGLVI-70"     "IGLV4-69"     "IGLVI-68"     "IGLV10-67"    "IGLVIV-66-1"  "IGLVV-66"     "IGLVIV-65"    "IGLVIV-64"    "IGLVI-63"     "IGLV1-62"     "IGLV8-61"     "IGLV4-60"     "IGLVIV-59"    "IGLVV-58"     "IGLV6-57"     "IGLVI-56"     "IGLV11-55"    "IGLV10-54"    "IGLVIV-53"    "IGLV5-52"     "IGLV1-51"     "IGLV1-50"     "IGLV9-49"     "IGLV5-48"     "IGLV1-47"     "IGLV7-46"     "IGLV5-45"     "IGLV1-44"     "IGLV7-43"     "IGLVI-42"     "IGLVVII-41-1" "IGLV1-41"     "IGLV1-40"     "IGLVI-38"     "IGLV5-37"     "IGLV1-36"     "IGLV7-35"     "IGLV2-34"     "IGLV2-33"     "IGLV3-32"     "IGLV3-31"     "IGLV3-30"     "IGLV3-29"    
[396] "IGLV2-28"     "IGLV3-27"     "IGLV3-26"     "IGLVVI-25-1"  "IGLV3-25"     "IGLV3-24"     "IGLV2-23"     "IGLVVI-22-1"  "IGLV3-22"     "IGLV3-21"     "IGLVI-20"     "IGLV3-19"     "IGLV2-18"     "IGLV3-17"     "IGLV3-16"     "IGLV3-15"     "IGLV2-14"     "IGLV3-13"     "IGLV3-12"     "IGLV2-11"     "IGLV3-10"     "IGLV3-9"      "IGLV2-8"      "IGLV3-7"      "IGLV3-6"      "IGLV2-5"      "IGLV3-4"      "IGLV4-3"      "IGLV3-2"      "IGLV3-1"      "IGLL5"        "IGLJ1"        "IGLC1"        "IGLJ2"        "IGLC2"        "IGLJ3"        "IGLC3"        "IGLJ4"        "IGLC4"        "IGLJ5"        "IGLC5"        "IGLJ6"        "IGLC6"        "IGLJ7"        "IGLC7"        "IGLL1"        "IGLVIVOR22-1" "IGLCOR22-1"   "IGLCOR22-2"   "IGLVIVOR22-2" "IGHV1OR21-1"  "IGSF5"       
se <- PercentageFeatureSet(se, pattern = "^IG", col.name = "perc.ig")

Check these percentage across samples

# Plot features by Sample together
VlnPlot(se,
  group.by = "sample_id",
  features = c("perc.mt", "perc.ribo", "perc.hb", "perc.ig"),
  pt.size = 0,
  ncol = 2
  ) &
    NoLegend() &
    scale_fill_manual(values = donor_pal)

Looking at these distributions as a stand alone we can see how there seems to be an inverse relation between mitochondrial and ribosomal proportion.

Feature Covariation

Looking at how QC features covary between them is very important. It gives us a better sense of what is going on and allows us to make better decisions.

This covariation plot colors by a specified variable in order to see its prevalence across the data. Below, the distribution of mitochondrial percentage across the dataset is shown in relation to the library size and complexity.

# Percent Mitochondrial Genes
se@meta.data %>%
  ggplot(., aes(x = nCount_RNA, y = nFeature_RNA, color = perc.mt)) +
    geom_point() +
    theme_classic() +
    scale_color_gradient(low = "yellow", high = "red")

# Percent MT Genes per sample
se@meta.data %>%
  ggplot(., aes(x = nCount_RNA, y = nFeature_RNA, color = perc.mt)) +
  geom_point() +
  facet_wrap(~sample_id) +
  theme_classic() +
  scale_color_gradient(low = "yellow", high = "red") +
  scale_x_continuous(
        labels = scales::unit_format(unit = "K", scale = 1e-3))

It is not strange that we lack cells with high mitochondrial percentage since the authors mention they filter out cells with >15% mitochondrial genes. This explains the cap we see at that range above.

Below we look at the distribution of ribosomal expression. Ribosomal percentage can be helpful when studying tumors as tumoral cells tend to have higher ribosomal percentage.

# Percent Ribosomal Genes
se@meta.data %>%
  ggplot(., aes(x = nCount_RNA, y = nFeature_RNA, color = perc.ribo)) +
    geom_point() +
    theme_classic() +
    scale_color_gradient(low = "yellow", high = "red")

# Percent MT Genes per sample
se@meta.data %>%
  ggplot(., aes(x = nCount_RNA, y = nFeature_RNA, color = perc.ribo)) +
  geom_point() +
  facet_wrap(~`sample_id`) +
  theme_classic() +
  scale_color_gradient(low = "yellow", high = "red") +
  scale_x_continuous(
        labels = scales::unit_format(unit = "K", scale = 1e-3))

Lastly, let’s plot the immunoglobulin % distribution.

# Percent Hemoglobin Genes
se@meta.data %>%
  ggplot(., aes(x = nCount_RNA, y = nFeature_RNA, color = perc.ig)) +
  geom_point() +
  theme_classic() +
  scale_color_gradient(low = "yellow", high = "red")

# Percent hemoglobin Genes per sample
se@meta.data %>%
  ggplot(., aes(x = nCount_RNA, y = nFeature_RNA, color = perc.ig)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~`sample_id`) +
  theme_classic() +
  scale_color_gradient(low = "yellow", high = "red") +
  scale_x_continuous(
        labels = scales::unit_format(unit = "K", scale = 1e-3))

We can clearly see how the cells in the belly of the curve tend to have a higher immunoglobulin % suggesting they are Plasma Cells.

Add Batch Information

Preprocessed Seurat objects will often contain batch and sample information. From the paper for this dataset, the batch information is found in a supplementary table and the authors claimed most doublets were classified as “Uncategorized”. Because they did not include the doublet information in the metadata, we will add it in this section.

Packages to find doublets in both R and Python run one batch at a time as doublets are only introduced as a technical error at the 10X GEM run level during the encapsulation of cells within oil droplets. We imported the batch information so we could subset the larger object and filter out doublets per batch. Doublet detection algorithms must be run by 10X GEM run!

table(se$batch, se$sample_id)
                 
                  T024 T036  T44 T057 T110 T160 T161 T182 T017 T019 T176 T189 T197 T203 T202
  4918STDY7273964    0    0    0    0    0    0    0    0 1008    0    0    0    0    0    0
  4918STDY7273965    0    0    0    0    0    0    0    0    0 1602    0    0    0    0    0
  4918STDY7274839 1065    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  4918STDY7333456    0 2582    0    0    0    0    0    0    0    0    0    0    0    0    0
  4918STDY7389431    0    0 1499    0    0    0    0    0    0    0    0    0    0    0    0
  4918STDY7447825    0    0    0 1911    0    0    0    0    0    0    0    0    0    0    0
  4918STDY7702680    0    0    0    0 1143    0    0    0    0    0    0    0    0    0    0
  4918STDY7844898    0    0    0    0    0  891    0    0    0    0    0    0    0    0    0
  4918STDY7844899    0    0    0    0    0    0  957    0    0    0    0    0    0    0    0
  4918STDY7901096    0    0    0    0    0    0    0    0    0    0 3430    0    0    0    0
  4918STDY7923744    0    0    0    0    0    0    0 1254    0    0    0    0    0    0    0
  4918STDY7933270    0    0    0    0    0    0    0    0    0    0    0  741    0    0    0
  4918STDY7962470    0    0    0    0    0    0    0    0    0    0    0    0 2360    0    0
  4918STDY7982702    0    0    0    0    0    0    0    0    0    0    0    0    0  740    0
  4918STDY7982703    0    0    0    0    0    0    0    0    0    0    0    0    0    0 1319

In this dataset we can see how each participant was processed in the same batch.

DoubletFinder

Doublet detection and removal is sensitive because you don’t want to remove any valuable outliers but still take into consideration the noisiness of the data. There are many different packages that can be used to detect doublets. If you are familiar with Python, we recommend Scrublet. But for the sake of staying in one programming language, DoubletFinder was found to be one of the most accurate R packages.

DoubletFinder uses a nearest neighbor approach to identify doublets. The package has a function called doubletFinder that takes in a Seurat object and returns a Seurat object with doublet information added in the metadata.

How it Works

  1. DoubletFinder combines random cell’s gene expression profiles to create “fake doublets.”
  2. Artificial doublets are inserted into the data.
  3. Dimensionality reduction and a KNN graph is built with our cells and the fake doublets.
  4. On the k-nearest neighbor graph, each cell has k most-similar neighbors. pANN is the proportion of each cell’s neighbors that are “fake doublets”.
  5. A high pANN value indicates that a larger proportion of the cells around a specific cell in the kNN graph are “fake doublets” suggesting that the cell has a transcriptome similar to a doublet.

Cells in transitioning states or cells with mixed phenotypes can be mistaken for doublets. It is important not to disregard cells just because they have a large doublet score (pANN value), there might be interesting biological information in these cells!

DoubletFinder allows you to set a threshold and will give each cell a categorical result of ‘Singlet’ or ‘Doublet’. We suggest looking at the raw pANN values to understand the distribution of these cells and make threshold adjustments accordingly.

Parameters

pK parameter: PC neighborhood size that is used to compute pANN, and it is expressed as a proportion of the merged real-artificial data. This metric determines the accuracy of the classification model. The higher the pK, the more stringent the model is in classifying doublets. The lower the pK, the more lenient the model is in classifying doublets.

pN parameter: default number of doublets we expect to find (default = 0.25). pN functions as the threshold.

pANN: proportion of artificial nearest neighbors or the “doublet score”.

Note: DoubletFinder is only sensitive to heterotypic doublets (transcriptionally-distinct doublets) so the developer suggests using a cell-type annotations (here: batch_seurat_object@meta.data$seurat_clusters) to model the expected proportion of homotypic (transcriptionally-similar) doublets.

library(DoubletFinder)
exp_btch <- unique(se@meta.data$batch)
# Run doubletFinder
process_batch <- function(batch) {
  print(batch)

  # 1 - Subset Seurat object by batch
  batch_seurat_object <- subset(se, cells = which(se@meta.data$batch == batch))

  # 2 - Process count data of that subseted Seurat object
  batch_seurat_object <- batch_seurat_object %>%
    NormalizeData() %>%
    FindVariableFeatures() %>%
    ScaleData() %>%
    RunPCA()

  # 3 - nExp defines the expected pANN threshold used to make final doublet-decisions
  # annotations <- batch_seurat_object$Celltype
  # homotypic.prop <- modelHomotypic(annotations)
  nExp_poi <- round(0.075 * nrow(se@meta.data))  # Assuming 7.5% doublet formation rate
  # nExp_poi.adj <- round(nExp_poi * (1 - homotypic.prop))

  # 4 - Run doubletFinder
  batch_seurat_object <- doubletFinder(
    batch_seurat_object,
    PCs = 1:20,
    pN = 0.1,
    pK = 50 / ncol(batch_seurat_object), # set a k to aproximately 50 cells per neighborhood for pANN
    nExp = nExp_poi,
    reuse.pANN = FALSE, # If TRUE, pANN will be reused from a previous run
    sct = FALSE) # If TRUE, sctransform will be used for preprocessing

  # 7 - Format doublet information for return dataframe
  pANN <- colnames(batch_seurat_object@meta.data) %>%
    keep(str_detect(colnames(batch_seurat_object@meta.data), '^pANN*')) # pANN_0.25_0.3_1000
  DF_class <- colnames(batch_seurat_object@meta.data) %>%
    keep(str_detect(colnames(batch_seurat_object@meta.data), '^DF.classifications*')) # DF.classifications_0.25_0.3_1000
  ## Extract values from pANN variable name
  params <- gsub("^pANN_", "", pANN)  # Remove "pANN_"
  params <- strsplit(params, "_")[[1]]  # Split by "_"
  pN <- params[1]  # Extract "0.1"
  pK <- params[2]  # Extract "0.1"
  doublet_run <- params[3]  # Extract "1000"

  ## Create new columns "pN", "pK", and "doublet_run"
  colnames(batch_seurat_object@meta.data)[colnames(batch_seurat_object@meta.data) == pANN] <- "pANN"
  colnames(batch_seurat_object@meta.data)[colnames(batch_seurat_object@meta.data) == DF_class] <- "DF_class"
  batch_seurat_object@meta.data$pN <- pN
  batch_seurat_object@meta.data$pK <- pK
  batch_seurat_object@meta.data$doublet_run <- doublet_run

  # 8 - Return DF of doublet information here
  df <- data.frame(
    Barcode = colnames(batch_seurat_object),
    batch = batch_seurat_object@meta.data$batch,
    pANN = batch_seurat_object@meta.data$'pANN',
    DF_class = batch_seurat_object@meta.data$'DF_class',
    doublet_run = batch_seurat_object@meta.data$doublet_run,
    pK = batch_seurat_object@meta.data$pK,
    pN = batch_seurat_object@meta.data$pN
    )
  rownames(df) <- df$Barcode
  return(df)

}

processed_doublets <- lapply(exp_btch, process_batch)
[1] 4918STDY7273964
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 112 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7273965
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 178 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7274839
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 118 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7333456
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 287 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7389431
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 167 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7447825
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 212 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7702680
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 127 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7844898
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 99 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7844899
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 106 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7901096
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 381 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7923744
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 139 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7933270
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 82 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7962470
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 262 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7982702
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 82 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
[1] 4918STDY7982703
Levels: 4918STDY7273964 4918STDY7273965 4918STDY7274839 4918STDY7333456 4918STDY7389431 4918STDY7447825 4918STDY7702680 4918STDY7844898 4918STDY7844899 4918STDY7901096 4918STDY7923744 4918STDY7933270 4918STDY7962470 4918STDY7982702 4918STDY7982703
[1] "Creating 147 artificial doublets..."
[1] "Creating Seurat object..."
[1] "Normalizing Seurat object..."
[1] "Finding variable genes..."
[1] "Scaling data..."
[1] "Running PCA..."
[1] "Calculating PC distance matrix..."
[1] "Computing pANN..."
[1] "Classifying doublets.."
# Add doublet information to the main seurat object
dbl_info <- bind_rows(processed_doublets)
# Save Doublet information in Seurat object
saveRDS(dbl_info, '../data/doublet_df.rds')
### Load Pre-Run DF object
dbl_info <- readRDS('../data/doublet_df.rds')
dbl_info <- dbl_info %>% select(-Barcode)
se <- AddMetaData(
    object = se, 
    metadata = dbl_info
  )

Perform Preprocessing

se <- se %>%
    NormalizeData(verbose = FALSE) %>%
    FindVariableFeatures(
        method = "vst",
        nfeatures = 3000,
        verbose = FALSE) %>%
    ScaleData(verbose = FALSE, features = VariableFeatures(.)) %>%
    RunPCA(verbose = FALSE) %>%
    RunUMAP(dims = 1:30, verbose = FALSE)
# Plot UMAP colored by batch
DimPlot(se, reduction = "umap", group.by = 'DF_class') +
  labs(title = "UMAP Plot Colored by Singlet/Doublet classification")

Let’s look at the distribution of pANN values

# Plot UMAP by pANN
p1 <- FeaturePlot(se, reduction = "umap", features = "pANN") +
  labs(title = "UMAP Plot Colored by pANN") 

p1

# DimPlot(se, reduction = "umap", group.by = "Celltype") +
#   labs(title = "UMAP Plot Colored by Batch")

We can see how Uncategorized 2 has a high pANN but also intermediate Monocytes. As mentioned in the paper the doublets in this dataset are annotated as Uncategorized which makes sense. However, those cells labeled as intermediate monocyte could be a case where we have a cell with a mixed phenotype of two other cell types in the dataset (classical and nonclassical monocytes). This leads to that population having a high pANN score without them being a doublet.

Filter out cells

Label cells as low quality according to three metrics:

  1. Any cell with less that 500 UMIs

  2. Any cell with less than 200 genes

  3. Any cell with more than 15% mitochondrial genes

se@meta.data <- se@meta.data %>%
  mutate(
    quality = if_else(
        nFeature_RNA > 200 & nCount_RNA > 500 & perc.mt < 15, "good quality", "bad quality"),
    quality = factor(quality, levels = c("bad quality", "good quality"))
    )
s2 <- FeaturePlot(se, reduction = "umap", features = "pANN") +
  labs(title = "UMAP Plot Colored by pANN") +
  scale_fill_gradient(limits = c(0, 1))
# Plot UMAP of good quality cells
s3 <- DimPlot(se, reduction = "umap", group.by = "quality") +
  labs(title = "UMAP Plot Colored by Cell Quality") 
s4 <- DimPlot(se, reduction = "umap", group.by = "sample_id") +
  labs(title = "UMAP Plot Colored by Sample ID")
s2 | s3 | s4

Lack of overlap between pANN dist and doublet dist is expected since pANN identifies potential doublets (expected to have higher library size and complexity) while the cell quality filter aims to identify the opposite pattern.

We will keep the potential doublets in the data because they may be biologically relevant.

Differential Gene Expression comparison between good and bad quality cells

Lastly, we carry out a differential gene expression analysis between poor and good quality cells. This is aimed at making sure that the cells labelled as “bad quality” don’t make up a specific cell type that happens to have small library size and complexity. Ideally, when removing these cells we expect to see that the genes differentially expressed in “bad quality” are just mitochndrial ones.

# Using the non-batched 'discard' vector for demonstration purposes,
# as it has more cells for stable calculation of 'lost'.
discard <- se$quality == "bad quality"
lost <- sparseMatrixStats::rowMeans2(se@assays$RNA$counts[, discard])
kept <- sparseMatrixStats::rowMeans2(se@assays$RNA$counts[, !discard])

library(edgeR)
logged <- cpm(cbind(lost, kept), log = TRUE, prior.count = 2)
logFC <- logged[,1] - logged[,2]
abundance <- rowMeans(logged)

df <- data.frame(logFC, abundance, gene = row.names(se)) %>%
    arrange(desc(logFC)) %>%
    mutate(rank = row_number()) %>%
    mutate(gene_txt = if_else(rank <= 30 | rank > n() - 30, gene, NA_character_))
# Assuming you have a data frame with logFC and abundance
df %>%
    mutate(
        gene_type = case_when(
            str_detect(gene, pattern = "^MT-") ~ "mt",
            str_detect(gene, pattern = "^RPS|^RPL") ~ "rb",
            str_detect(gene, pattern = "^HB[^(P)]") ~ "hb",
            TRUE ~ "other"
        )
    ) %>%
    ggplot(aes(x = abundance, y = logFC, color = gene_type, label = gene_txt)) +
    geom_point() +
    ggrepel::geom_text_repel() +
    geom_hline(yintercept = 1, color = "red", linetype = "dashed") +
    geom_hline(yintercept = -1, color = "red", linetype = "dashed") +
    annotate(
        "segment",
        x = 12, xend = 12,
        y = 1, yend = 3,
        color = "blue",
        arrow = arrow(length = unit(0.2, "cm"))) +
    annotate(
        "text",
        x = 13, y = 2,
        label = "Overexpressed in poor quality cells",
        vjust = -1,
        hjust = 0.3,
        color = "blue") +
    annotate(
        "segment",
        x = 12, xend = 12,
        y = -Inf, yend = -1,
        color = "blue",
        arrow = arrow(length = unit(0.2, "cm"), ends = "first")) +
    annotate(
        "text",
        x = 12, y = -1.5,
        label = "Overexpressed in good quality cells",
        vjust = 1,
        hjust = -0.1,
        color = "blue") +
    theme_classic() + 
    labs(
        title = "Average Gene Expression vs. Log2 Fold Change",
        x = "Average Gene Expression",
        y = "Log2 Fold Change") 

Some genes are differentially expressed in the “poor quality” cells group. We can see how mitochondrial genes are some of them but also many APO and FABP genes involved in lipid transport, IL32 whose expression is elevated in epithelial cells in inflammatory settings…

Resave Seurat object

saveRDS(object = se, file = '../data/se_qc.rds')

Session Info

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_4.0.16         limma_3.58.1         KernSmooth_2.23-26   fields_16.3          viridisLite_0.4.2    spam_2.11-0          ggridges_0.5.6       DoubletFinder_2.0.4  colorBlindness_0.1.9 RColorBrewer_1.1-3   lubridate_1.9.4      forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2          readr_2.1.5          tidyr_1.3.1          tibble_3.2.1         ggplot2_3.5.1        tidyverse_2.0.0      Seurat_5.2.1         SeuratObject_5.0.2   sp_2.1-4            

loaded via a namespace (and not attached):
  [1] rstudioapi_0.17.1        jsonlite_1.8.9           magrittr_2.0.3           spatstat.utils_3.1-2     ggbeeswarm_0.7.2         farver_2.1.2             rmarkdown_2.29           vctrs_0.6.5              ROCR_1.0-11              spatstat.explore_3.3-4   htmltools_0.5.8.1        gridGraphics_0.5-1       sctransform_0.4.1        parallelly_1.41.0        htmlwidgets_1.6.4        ica_1.0-3                plyr_1.8.9               plotly_4.10.4            zoo_1.8-12               igraph_2.1.2             mime_0.12                lifecycle_1.0.4          pkgconfig_2.0.3          Matrix_1.6-5             R6_2.5.1                 fastmap_1.2.0            MatrixGenerics_1.14.0    fitdistrplus_1.2-2       future_1.34.0            shiny_1.10.0             digest_0.6.37            colorspace_2.1-1         patchwork_1.3.0          tensor_1.5               RSpectra_0.16-2          irlba_2.3.5.1            labeling_0.4.3           progressr_0.15.1         spatstat.sparse_3.1-0    timechange_0.3.0         httr_1.4.7               polyclip_1.10-7          abind_1.4-8              compiler_4.3.1           withr_3.0.2              fastDummies_1.7.5        maps_3.4.2.1            
 [48] MASS_7.3-60              tools_4.3.1              vipor_0.4.7              lmtest_0.9-40            beeswarm_0.4.0           httpuv_1.6.15            future.apply_1.11.3      goftest_1.2-3            glue_1.8.0               nlme_3.1-166             promises_1.3.2           grid_4.3.1               Rtsne_0.17               cluster_2.1.8            reshape2_1.4.4           generics_0.1.3           gtable_0.3.6             spatstat.data_3.1-4      tzdb_0.4.0               data.table_1.16.4        hms_1.1.3                spatstat.geom_3.3-4      RcppAnnoy_0.0.22         ggrepel_0.9.6            RANN_2.6.2               pillar_1.10.1            RcppHNSW_0.6.0           later_1.4.1              splines_4.3.1            lattice_0.22-6           survival_3.8-3           deldir_2.0-4             tidyselect_1.2.1         locfit_1.5-9.10          miniUI_0.1.1.1           pbapply_1.7-2            knitr_1.49               gridExtra_2.3            scattermore_1.2          xfun_0.50                statmod_1.5.0            matrixStats_1.5.0        stringi_1.8.4            lazyeval_0.2.2           yaml_2.3.10              evaluate_1.0.3           codetools_0.2-20        
 [95] cli_3.6.3                uwot_0.1.16              xtable_1.8-4             reticulate_1.40.0        munsell_0.5.1            Rcpp_1.0.14              globals_0.16.3           spatstat.random_3.3-2    png_0.1-8                ggrastr_1.0.2            spatstat.univar_3.1-1    parallel_4.3.1           dotCall64_1.2            sparseMatrixStats_1.14.0 listenv_0.9.1            scales_1.3.0             rlang_1.1.4              cowplot_1.1.3